Linear Mixed Model Analysis

Authors

Dr. Muhammad Abdul Hafiz bin Kamarul Zaman

Dr. Tengku Muhammad Hudzaifah bin Tengku Mokhtar

Dr. Muhammad Za’im bin Mohd Samsuri

Published

June 15, 2025

knitr::include_graphics("group.gif")

1 INTRODUCTION

2 DATASET

dataset is taken from . the dataset is adopted from a research that objective is to study the relationship between time spent outdoors and mental wellbeing, across all of Scotland. The researcher contact all the Local Authority Areas (LAAs) and ask them to collect data for them, with participants completing the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS), a self-report measure of mental health and well-being, and being asked to estimate the average number of hours they spend outdoors each week.

2.1 Variables

the variables for the dataset are:

  1. local authority area (laa): consist of 20 area
  2. mental wellbeing score (wellbeing) : Wellbeing score (Warwick Edinburgh Mental Wellbeing Scale). Range 15 - 75, with higher scores indicating better mental wellbeing
  3. outdoor time (outdoor_time) : Number of hours spent outdoors per week
  4. density (density) : Population density of local authority area (number of people per square km)
  5. gender (gender) : male or female
  6. Participant Identifier (`ppt) : unique ID for each participants

individuals (ppt) are nested within district (laa). This means:

Level 1 is participants (ppt) Level 2 is district (laa)

3 Install and load packages

library(tidyverse)
library(ggplot2)
library(gtsummary)
library(readxl)
library(broom)
library(DT)
library(lme4)
library(kableExtra)
library(lmerTest)
library(lmtest)
library(broom.mixed)

4 Read Data

data1 <- read_excel("D:/R Workspace/correlated numerical assignment/scotlandfinal.xlsx")
View(data1)

5 Data wrangling

glimpse(data1)
Rows: 132
Columns: 9
$ ppt          <chr> "ID1", "ID10", "ID100", "ID101", "ID102", "ID103", "ID104…
$ name         <chr> "Kendall Millar", "Dominik Hill", "Blair Harrison", "Zand…
$ laa          <chr> "West Lothian", "Falkirk", "Falkirk", "Scottish Borders",…
$ outdoor_time <dbl> 20, 23, 29, 21, 10, 19, 13, 21, 16, 12, 22, 13, 26, 17, 2…
$ wellbeing    <dbl> 37, 34, 39, 42, 37, 42, 38, 44, 47, 35, 42, 35, 29, 28, 2…
$ density      <dbl> 421, 526, 526, 29, 19, 13, 31, 536, 480, 258, 44, 536, 42…
$ age          <dbl> 74, 20, 77, 27, 68, 61, 73, 24, 29, 53, 82, 67, 30, 55, 5…
$ BMI          <dbl> 22, 17, 28, 32, 26, 34, 19, 22, 36, 19, 31, 18, 19, 24, 2…
$ gender       <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Male", "…
summary(data1)
     ppt                name               laa             outdoor_time  
 Length:132         Length:132         Length:132         Min.   : 0.00  
 Class :character   Class :character   Class :character   1st Qu.:12.75  
 Mode  :character   Mode  :character   Mode  :character   Median :16.00  
                                                          Mean   :16.92  
                                                          3rd Qu.:21.00  
                                                          Max.   :37.00  
   wellbeing        density            age             BMI      
 Min.   :16.00   Min.   :  10.0   Min.   :18.00   Min.   :17.0  
 1st Qu.:34.00   1st Qu.:  19.0   1st Qu.:32.75   1st Qu.:21.0  
 Median :42.00   Median :  44.0   Median :55.50   Median :26.0  
 Mean   :41.98   Mean   : 477.5   Mean   :54.01   Mean   :26.3  
 3rd Qu.:48.00   3rd Qu.: 491.5   3rd Qu.:70.50   3rd Qu.:32.0  
 Max.   :70.00   Max.   :3581.0   Max.   :85.00   Max.   :36.0  
    gender         
 Length:132        
 Class :character  
 Mode  :character  
                   
                   
                   
boxplot(data1$outdoor_time,
        main = "Boxplot of Outdoor Time",
        ylab = "Outdoor Time",
        col = "lightblue",
        border = "darkblue")

data1 <- data1 %>% 
  mutate(gender = factor(gender, labels = c('male', 'female')))
data1 <- data1 %>% 
  mutate(gender = factor(gender, labels = c('male', 'female')))
data1<-data1 %>% mutate_if(is.character,~ as_factor(.))

6 EDA

summarising the data

data1 %>%
  select(laa, gender, outdoor_time, wellbeing, density) %>%
  tbl_summary()
Characteristic N = 1321
laa
    West Lothian 7 (5.3%)
    Falkirk 5 (3.8%)
    Scottish Borders 6 (4.5%)
    Dumfries and Galloway 6 (4.5%)
    Argyll and Bute 8 (6.1%)
    Perth and Kinross 7 (5.3%)
    East Renfrewshire 6 (4.5%)
    Inverclyde 5 (3.8%)
    Midlothian 6 (4.5%)
    Moray 7 (5.3%)
    West Dunbartonshire 6 (4.5%)
    East Ayrshire 7 (5.3%)
    Shetland Islands 7 (5.3%)
    Stirling 8 (6.1%)
    City of Edinburgh 8 (6.1%)
    Orkney Islands 6 (4.5%)
    Na h-Eileanan Siar 6 (4.5%)
    Glasgow City 8 (6.1%)
    Highland 7 (5.3%)
    Angus 6 (4.5%)
gender
    male 19 (14%)
    female 113 (86%)
outdoor_time 16 (13, 21)
wellbeing 42 (34, 48)
density 44 (19, 503)
1 n (%); Median (Q1, Q3)
data1 %>%
  ggplot(aes(x = outdoor_time, y = wellbeing)) +
  geom_point() +
  geom_smooth(method = lm)

data1 %>%
  ggplot(aes(x = outdoor_time, y = wellbeing, 
             col = gender, gender)) +
  geom_point() +
  geom_smooth(method = lm)

7 Comparing groups with multilevel model

Start with null model (simplest model)

                                      𝑠𝑐𝑜𝑟𝑒𝑖𝑗=𝛽0+𝑢0𝑗+𝑒𝑖𝑗
  1. 𝑠𝑐𝑜𝑟𝑒𝑖𝑗= the score attainment of participants 𝑖 in district𝑗
  2. 𝛽0 = the overall mean wellbeing score across district
  3. 𝑢0𝑗 = the effect of district𝑗on score. This is also level-2 residuals
  4. 𝑒𝑖𝑗 = individual-level residual. This is level-1 residuals

the null model : score

8 Single level analysis

using lm() as the outcome is assume as normally distributed

m.lm <- lm( wellbeing ~ 1, data = data1)
summary(m.lm)

Call:
lm(formula = wellbeing ~ 1, data = data1)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.9848  -7.9848   0.0152   6.0152  28.0152 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  41.9848     0.9955   42.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.44 on 131 degrees of freedom

9 Multilevel analysis

We use the lme4 package, by starting with the constant only model (null model) with no explanatory variable or basically a random intercept with constant only model. the model is named as m0, the random effect is due to district.

m0 <- 
  lmer(wellbeing ~ 1 + (1 | laa), 
       data = data1,  REML = FALSE)
summary(m0)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ 1 + (1 | laa)
   Data: data1

     AIC      BIC   logLik deviance df.resid 
   881.2    889.8   -437.6    875.2      129 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.14222 -0.61901  0.02757  0.74655  1.95314 

Random effects:
 Groups   Name        Variance Std.Dev.
 laa      (Intercept) 99.92    9.996   
 Residual             27.24    5.219   
Number of obs: 132, groups:  laa, 20

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)   41.807      2.282 20.045   18.32 5.46e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

based on this model,the overall mean wellbeing score = 41.807. the mean for district j is estimated as 41.807 + 𝑈̂ 0𝑗 where 𝑈̂ 0𝑗 is the district residuals

the ICC is:

99.92   /(99.92  + 27.24)
[1] 0.7857817

78.5% variance of the wellbeing score is attributed to the difference between district

or use tidy() for a proper table

tidy(m0) %>%
  kbl() %>%
  kable_styling()
effect group term estimate std.error statistic df p.value
fixed NA (Intercept) 41.807333 2.281684 18.32302 20.04486 0
ran_pars laa sd__(Intercept) 9.995826 NA NA NA NA
ran_pars Residual sd__Observation 5.218741 NA NA NA NA

difference between multilevel and linear regression model

mlr <- lm(wellbeing ~ 1, data = data1)
logLik(mlr) ; logLik(m0)
'log Lik.' -508.4647 (df=2)
'log Lik.' -437.5814 (df=3)

the diiference is 70.88

anova(m0,mlr)

the comparison is significant, thus the complex model of the the random intercept is the better model.

9.1 Random intercept model

9.1.1 adding the explanatory variable

We will model the effect of a individual-level variable outdoor time in the model

𝑠𝑐𝑜𝑟𝑒𝑖𝑗=𝛽0+𝛽1outdoor_time𝑖𝑗+𝑢0𝑗+𝑒𝑖𝑗

ri <- lmer(wellbeing ~ outdoor_time + (1 | laa), 
           data = data1, 
           REML = FALSE)
summary(ri)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ outdoor_time + (1 | laa)
   Data: data1

     AIC      BIC   logLik deviance df.resid 
   874.7    886.2   -433.4    866.7      128 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2347 -0.7223  0.1226  0.6432  1.8406 

Random effects:
 Groups   Name        Variance Std.Dev.
 laa      (Intercept) 100.71   10.036  
 Residual              25.24    5.024  
Number of obs: 132, groups:  laa, 20

Fixed effects:
              Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   38.19177    2.59210  32.40431  14.734 6.44e-16 ***
outdoor_time   0.21340    0.07202 114.36148   2.963  0.00371 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
outdoor_tim -0.471

or

tidy(ri, conf.int = TRUE) %>%
  kbl %>%
  kable_styling()
effect group term estimate std.error statistic df p.value conf.low conf.high
fixed NA (Intercept) 38.1917741 2.5920993 14.733916 32.40431 0.0000000 32.9144230 43.4691252
fixed NA outdoor_time 0.2133955 0.0720217 2.962931 114.36148 0.0037078 0.0707258 0.3560651
ran_pars laa sd__(Intercept) 10.0355904 NA NA NA NA NA NA
ran_pars Residual sd__Observation 5.0235615 NA NA NA NA NA NA

the equation for average fitted regression line (across district)

    𝑠𝑐𝑜𝑟𝑒𝑖𝑗=38.1917741 + 0.2133955outdoor_time𝑖𝑗
    

the slope is fixed, while the intercept differs as it depends on the random effect of the district

9.1.2 Prediction

we can predict the wellbeing score based on the mixed model for each individual. the prediction is the average fitted regression plus the district intercept

pred_score <- fitted(ri)
head(pred_score, 10)
       1        2        3        4        5        6        7        8 
32.37244 31.74239 33.02276 40.05832 37.41243 43.90313 46.05396 44.41006 
       9       10 
42.70001 33.07180 

there will be 20 random effect because there were 20 district, and the random effect of each hospital as below

rand_ef <- ranef(ri)
head(rand_ef$laa, 20)

to get the fitted values

ri_fitted <- augment(ri)
ri_fitted %>% 
  slice(1:12)
ri_fitted %>% 
  slice(42:52)

confirmation with manual calculation

using the first observation:

  1. intercept: 38.1917741
  2. outdoor time : 20 minutes : 0.2133955(minutes)
  3. Community: West Lothian : -10.0872472
38.1917741-10.0872472   +(0.2133955*20)
[1] 32.37244

the value from the table and manual calculation is similar which is 32.37244

9.1.3 Plot

ggplot(ri_fitted, aes(outdoor_time, .fitted, group = laa )) +
  geom_point(alpha = 0.3) +
  geom_line(alpha = 0.3) +
  ylab('fitted wellbeing score') +
  xlab('outdorr_time') +
  ggtitle('The fitted value for random intercept model with outdoor time ') +
  theme_bw()

9.1.4 Variance

9.1.5 variance between district

in the constant only model the variance is 99.92 then the variance slightly inncrease after adding outdoor_time where model with outdoor_time as the explanatory variable now has the variance of 100.71 After accounting for coutdoor_time effects, the proportion of unexplained variance that is due to differences between district increases slightly to 100.71/(100.71+25.24)=80% .

100.71/(100.71+25.24)
[1] 0.799603

9.1.6 within district variance

  1. constant only model variance is 27.24
  2. reduction of variance after adding outdoor_time
  3. model with outdoor_time as the explanatory variable 25.24

we can see that the addition of outdoor_time has increases the amount of variance at the district but not at the individual level. The between-district variance has increases from 99.92 to 100.71, and the within-school variance has reduced from 27.24 to 25.24.

10 Random slope model

now extend the random intercept model fitted before to allow both the intercept and the slope to vary randomly across district.

10.1 model

Wellbeing 𝑠𝑐𝑜𝑟𝑒𝑖𝑗=𝛽0+𝛽1outdoor_time𝑖𝑗+𝑢0𝑗+𝑢1𝑗outdoor_time𝑖𝑗+𝑒𝑖𝑗
rs <- lmer(wellbeing ~ outdoor_time + (1 + outdoor_time | laa), 
           data = data1, REML = FALSE)
summary(rs)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ outdoor_time + (1 + outdoor_time | laa)
   Data: data1

     AIC      BIC   logLik deviance df.resid 
   866.6    883.9   -427.3    854.6      126 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.26758 -0.59753  0.02406  0.62356  1.95808 

Random effects:
 Groups   Name         Variance Std.Dev. Corr
 laa      (Intercept)  44.2585  6.6527       
          outdoor_time  0.0891  0.2985   0.38
 Residual              21.5661  4.6439       
Number of obs: 132, groups:  laa, 20

Fixed effects:
             Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  38.27352    1.93169 19.04358   19.81  3.6e-14 ***
outdoor_time  0.21161    0.09662 13.78158    2.19   0.0462 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
outdoor_tim -0.231
rs <- lmer(wellbeing ~ outdoor_time + (1 + outdoor_time | laa), data = data1, control = lmerControl(optimizer = 'bobyqa'),
           REML = FALSE)
summary(rs)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ outdoor_time + (1 + outdoor_time | laa)
   Data: data1
Control: lmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
   866.6    883.9   -427.3    854.6      126 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2676 -0.5977  0.0241  0.6236  1.9581 

Random effects:
 Groups   Name         Variance Std.Dev. Corr
 laa      (Intercept)  44.25665 6.6526       
          outdoor_time  0.08908 0.2985   0.38
 Residual              21.56642 4.6440       
Number of obs: 132, groups:  laa, 20

Fixed effects:
             Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  38.27359    1.93167 19.04249   19.81  3.6e-14 ***
outdoor_time  0.21161    0.09661 13.78262    2.19   0.0462 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
outdoor_tim -0.231

nicer output

tidy(rs) %>% kbl() %>%
  kable_styling()
effect group term estimate std.error statistic df p.value
fixed NA (Intercept) 38.2735933 1.931666 19.813768 19.04249 0.0000000
fixed NA outdoor_time 0.2116052 0.096614 2.190214 13.78262 0.0462223
ran_pars laa sd__(Intercept) 6.6525669 NA NA NA NA
ran_pars laa cor__(Intercept).outdoor_time 0.3800737 NA NA NA NA
ran_pars laa sd__outdoor_time 0.2984660 NA NA NA NA
ran_pars Residual sd__Observation 4.6439655 NA NA NA NA

10.2 The fitted values

rs_res <- augment(rs)
head(rs_res, 20)

The outdoor effect for district𝑗is estimated as 0.2116051+𝑢̂ 𝑖𝑗, and the between-school variance in these slopes is estimated as 0.09.

That means for the average district we predict an increase of 0.2116051 points in the wellbeing score for every increase in 1 hour of outdoor time.

The intercept variance of 38.2735950 is interpreted as the between-district variance when outdoor_time=0

The intercept-slope correlation is estimated as 0.38 which means that district with a high intercept tend to have a steeper-than-average slope. This suggests that local authorities (laa) with higher baseline levels of wellbeing tend to show a stronger positive effect of outdoor_time on wellbeing. In other words, the more time people spend outdoors in districts that already have high average wellbeing, the greater the marginal gain in wellbeing from additional outdoor time.

10.3 Comparing models between random intercept and random slope

anova(ri, rs)

There is very strong evidence that the outdoor time effect differs across district

10.4 Interpretation of random effects across district

The outdoor time effect for district 𝑗 is estimated as 0.21161+𝑈̂ 1𝑗, and the between-school variance in these slopes is estimated as 0.09.

For the average district we predict an increase of 0.21161 points in the wellbeing score for each hour of outdoor time. A 95% coverage interval for the school slopes is estimated as 0.21161±1.96√0.09 =−0.37639 to 0.79961.

Thus, assuming a normal distribution, we would expect the middle 95% of district to have a slope between −0.37639 and 0.79961.

ra.eff.rs <- ranef(rs, condVar = TRUE)
datatable(ra.eff.rs$laa)
plot(ra.eff.rs)
$laa

0 is equal to mean outdoor time of 16 hours per week

ra.eff.rs.sc <- ra.eff.rs$laa
names(ra.eff.rs.sc)
[1] "(Intercept)"  "outdoor_time"
ra.eff.rs.sc <- ra.eff.rs.sc %>%
  rename(rs_slope = outdoor_time, rs_int = "(Intercept)")

ra.eff.rs.sc %>% 
ggplot(aes( x = rs_int, y = rs_slope)) + 
  geom_point() +
  geom_vline(xintercept = 0) + 
  geom_hline(yintercept = 0)

This plot illustrates a positive correlation between intercepts and slopes:

districts with higher baseline wellbeing (positive random intercepts) tend to have stronger positive effects of outdoor_time on wellbeing (higher slopes).

Conversely, districts with lower-than-average baseline wellbeing (negative intercepts) tend to have weaker or negative relationships between outdoor_time and wellbeing.

#MODEL

expression(hat(wellbeingscore)[ij] == 
           (38.2735950 + hat(u)[0 * j]) + 
           (0.21161 + hat(u)[1 * j]) * outdoor_time[ij])
expression(hat(wellbeingscore)[ij] == (38.273595 + hat(u)[0 * 
    j]) + (0.21161 + hat(u)[1 * j]) * outdoor_time[ij])
plot(1, 1, type = "n", xlab = "", ylab = "", axes = FALSE)
text(1, 1,
     expression(hat(wellbeingscore)[ij] == 
                (38.2735950 + hat(u)[0 * j]) + 
                (0.21161 + hat(u)[1 * j]) * outdoor_time[ij]),
     cex = 1.2)

datatable(rs_res)

manual calculation :

using the first observation:

  1. intercept: 38.2735950
  2. outdoor time : 20 minutes : 0.2116051(minutes)
  3. Community: West Lothian : -3.57859346508806
  4. outdoor time slope for west lothian :-0.321525576968577
38.2735950-3.57859346508806+((0.2116051-0.321525576968577)*20)
[1] 32.49659

similar to the fitted value given in the table above: 32.49659260739616

#Adding a level 1 variable to the random slope model

rs_gen <- lmer(wellbeing ~ outdoor_time + gender + (1 + outdoor_time | laa),
                data = data1, REML = FALSE,
                lmerControl(optimizer = 'bobyqa'))
summary(rs_gen)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ outdoor_time + gender + (1 + outdoor_time | laa)
   Data: data1
Control: lmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
   867.7    887.9   -426.8    853.7      125 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.21776 -0.61231  0.02179  0.64354  1.95652 

Random effects:
 Groups   Name         Variance Std.Dev. Corr
 laa      (Intercept)  43.74062 6.6137       
          outdoor_time  0.07373 0.2715   0.51
 Residual              21.69867 4.6582       
Number of obs: 132, groups:  laa, 20

Fixed effects:
              Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   37.17430    2.24156  32.86510  16.584   <2e-16 ***
outdoor_time   0.21329    0.09234  12.75828   2.310   0.0383 *  
genderfemale   1.24578    1.26099 110.95959   0.988   0.3253    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) otdr_t
outdoor_tim -0.191       
genderfemal -0.515  0.047

11 adding level2 explanatory variables

rs_den <- lmer(wellbeing ~ outdoor_time + density + (1 + outdoor_time | laa),
                data = data1, REML = FALSE,
                lmerControl(optimizer = 'bobyqa'))
summary(rs_den)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: wellbeing ~ outdoor_time + density + (1 + outdoor_time | laa)
   Data: data1
Control: lmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
   867.7    887.9   -426.9    853.7      125 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.26090 -0.56573 -0.00021  0.62588  2.03823 

Random effects:
 Groups   Name         Variance Std.Dev. Corr
 laa      (Intercept)  43.9730  6.6312       
          outdoor_time  0.0945  0.3074   0.24
 Residual              21.4563  4.6321       
Number of obs: 132, groups:  laa, 20

Fixed effects:
              Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)  39.171950   2.157210 19.381769  18.159 1.25e-13 ***
outdoor_time  0.211476   0.098261 13.934109   2.152   0.0494 *  
density      -0.002048   0.002100 16.865581  -0.975   0.3433    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) otdr_t
outdoor_tim -0.283       
density     -0.446  0.028
anova(rs,rs_gen)
anova(rs,rs_den)

gender is not a significant covariates when included in the model.besides, model comparison also showed no significant difference between rs_den and rs_gen with the rs model.thus, the simpler model is chhosen. we remains with model with random slope for outdoor activity only

12 INTERACTION

as there is only one variable significant, no interaction checking needed

13 CHECKING FOR ASSUMPTION

Plot random effect

library(lattice)
randoms <- ranef(rs, condVar = TRUE)
dotplot(randoms)
$laa

This plot above display the random effects estimates for different local authority areas “laa” in Scotland. The left side shows the variation in baseline levels of the outcome “wellbeing” score across different local authorities. Areas like Na h-Eileanan Siar and City of Edinburgh have higher-than-average baseline scores, while places such as Glasgow City and Falkirk fall below the overall mean, indicating lower baseline levels of the outcome.

The right panel, the “outdoor_time” illustrates the estimated random slopes for the effect of outdoor time across each local authority. Here, the estimates are clustered around zero, suggesting that the effect of outdoor time on the outcome is relatively uniform across regions or that the model has shrunk the estimates toward the overall fixed slope due to limited group-level variation. This indicates that while baseline outcomes vary substantially between areas, the influence of outdoor time on the outcome does not differ much by location.

plot(rs)

the residual appear randomly scattered near the zero line .the assumption of homocedasticity was met

qqmath(rs)

the points majority clustered on the line, so the residual is normally distributed. the assumption was met.

14 References

  1. Practical Linear Mixed Models. Kamarul Imran Musa. 25 March 20222.
  2. https://uoepsy.github.io/lmm/00_datasets.html